11 research outputs found

    Metagenomics Binning Using Assembly Graphs

    Metagenomics involves the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. Recent developments in high-throughput sequencing technologies have enabled metagenomics to analyse samples from environments without having to rely on culture-based methods. Once an environmental sample is sequenced, a process called metagenomics binning is used to cluster the sequences into bins that represent different taxonomic groups such as species, genera or higher levels. Various efforts have been made to bin metagenomic sequences. One approach is to bin raw sequencing reads prior to assembly. However, reads are considered too short to produce accurate and reliable binning results for downstream analysis. Hence, the standard approach in metagenomics analysis is to assemble short reads into longer sequences called contigs and then bin the resulting contigs. Existing metagenomic contig-binning methods rely on the composition and abundance information of the contigs, and face challenges when binning short contigs and contigs with similar composition and abundance. Contigs are derived from the underlying assembly graph, which contains valuable connectivity information among contigs. However, existing metagenomic contig-binning methods do not consider the assembly graph in the binning process. Firstly, this thesis describes a bin refinement tool named GraphBin that improves existing metagenomic binning results using assembly graphs. GraphBin makes use of the assembly graph and a label propagation method to refine binning results of existing contig-binning tools by correcting mis-binned contigs and recovering short contigs that are discarded. Secondly, this thesis explains how to enable the detection of shared sequences among multiple species from assembly graphs and introduces a tool named GraphBin2, which can perform overlapped binning. GraphBin2 makes use of the assembly graph and the coverage information of contigs, which enables the detection of contigs that may belong to multiple species. Thirdly, this thesis introduces a stand-alone approach named MetaCoAG to bridge metagenomics binning and assembly by incorporating composition, coverage and assembly graphs. MetaCoAG uses single-copy marker genes to estimate the number of initial bins, assigns contigs to bins iteratively and adjusts the number of bins dynamically throughout the binning process. In summary, this thesis discusses the challenges in binning metagenomic contigs and the shortcomings of existing metagenomic contig-binning tools, and presents how the assembly graph can be incorporated to improve metagenomics binning.
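
    The idea of propagating bin labels over an assembly graph can be illustrated with a minimal sketch. This is not GraphBin's actual algorithm: the function name `propagate_labels`, the edge-list input, the majority-vote rule and the tie handling are all assumptions made for illustration. It only covers the recovery of unbinned contigs, not the correction of mis-binned ones.

```python
from collections import Counter


def propagate_labels(edges, initial_bins, max_rounds=10):
    """Give each unbinned contig the majority bin of its labelled neighbours.

    edges: iterable of (contig, contig) pairs from a hypothetical assembly graph.
    initial_bins: dict mapping contig -> bin label from an existing binning tool.
    """
    # Build an undirected adjacency map from the contig-to-contig edges.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    bins = dict(initial_bins)
    for _ in range(max_rounds):
        updates = {}
        for contig in sorted(neighbours):
            if contig in bins:
                continue  # already labelled; this sketch never relabels
            votes = Counter(bins[n] for n in neighbours[contig] if n in bins)
            if not votes:
                continue  # no labelled neighbour yet
            ranked = votes.most_common()
            # Assumption: leave the contig unlabelled when the vote is tied.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                updates[contig] = ranked[0][0]
        if not updates:
            break  # nothing changed, so the labelling has converged
        bins.update(updates)
    return bins


# Toy graph: a chain c1-c2-c3-c4 and a separate edge c5-c6.
edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c5", "c6")]
initial = {"c1": "binA", "c5": "binB"}
result = propagate_labels(edges, initial)
```

    In this toy graph the label `binA` spreads along the chain one contig per round, while `c6` inherits `binB` from its only neighbour. Real assembly graphs also carry coverage and composition signals that this sketch ignores.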

    GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs

    Metagenomic sequencing allows us to study structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for contig binning only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins.

    The human gut virome: composition, colonization, interactions, and impacts on human health

    The gut virome is an incredibly complex part of the gut ecosystem. Gut viruses play a role in many disease states, but it is unknown to what extent the gut virome impacts everyday human health. New experimental and bioinformatic approaches are required to address this knowledge gap. Gut virome colonization begins at birth and is considered unique and stable in adulthood. The stable virome is highly specific to each individual and is modulated by varying factors such as age, diet, disease state, and use of antibiotics. The gut virome primarily comprises bacteriophages, predominantly order Crassvirales, also referred to as crAss-like phages, in industrialized populations and other Caudoviricetes (formerly Caudovirales). The stability of the virome’s regular constituents is disrupted by disease. Transferring the fecal microbiome, including its viruses, from a healthy individual can restore the functionality of the gut. It can alleviate symptoms of chronic illnesses such as colitis caused by Clostridioides difficile. Investigation of the virome is a relatively novel field, with new genetic sequences being published at an increasing rate. A large percentage of unknown sequences, termed ‘viral dark matter’, is one of the significant challenges facing virologists and bioinformaticians. To address this challenge, strategies include mining publicly available viral datasets, untargeted metagenomic approaches, and utilizing cutting-edge bioinformatic tools to quantify and classify viral species. Here, we review the literature surrounding the gut virome, its establishment, its impact on human health, the methods used to investigate it, and the viral dark matter veiling our understanding of the gut virome.

    Deep HPM TD (Co-assembly) dataset and binning results

    Deep HPM TD dataset of the human metagenome sample from the tongue dorsum of a participant from the Deep WGS HMP clinical samples (Lloyd-Price et al., 2017). Contains the following files:
    - Contigs file
    - Paths file (metaSPAdes)
    - Assembly graph file (metaSPAdes)
    - Abundance file
    - Binning results (including CheckM results)

    Experiential Learning in Bioinformatics – Learner Support for Complex Workflow Modelling and Analysis

    Bioinformatics is focused on deriving biological understanding from large amounts of data with specialized skills and computational tools. Students who wish to pursue a career as a bioinformatician are required to have a good understanding of biology and computer science. One of the challenging areas for a student learning bioinformatics is complex workflow modelling and analysis; it incorporates several threshold concepts and liminal spaces for student learning, and demands higher levels of cognitive skill, active exploration and reflective reinforcement. Hence, proper learning material and interactive tools are required to support student learning through active exploration and experiential learning. The study presents the successful use of such a learner support tool, BioWorkflow [1], which we developed for use in bioinformatics teaching and research. An evaluation was done with a student sample (n=80), where the first group (n1=40) was given only the relevant course material and the second group (n2=40) was given the course material along with BioWorkflow to visualize concepts relevant to sequence alignment and workflow modelling. Better learning engagement during the experiment, better performance on advanced questions and a positive user response were observed from the students who used the BioWorkflow tool, compared to the control group. Student feedback strongly supported the view that tools similar to BioWorkflow are an essential element for enhancing teaching and learner support in bioinformatics; students appreciated the tool's usability and the help it provided in scoring high grades in the assessment.

    linsalrob/spae: 1.0

    Assembly and annotation steps for phage genomes sequenced on Illumina/MGI sequencing platforms, or Oxford Nanopore sequencing platforms

    Change Detection and Notification of Web Pages: A Survey

    The majority of currently available webpages are dynamic in nature and change frequently. New content gets added to webpages, and existing content gets updated or deleted. Hence, people find it useful to be alerted to changes in webpages that contain information of value to them. In the current context, keeping track of these webpages and getting alerts about different changes have become significantly challenging. Change Detection and Notification (CDN) systems were introduced to automate this monitoring process and to notify users when changes occur in webpages. This survey classifies and analyzes different aspects of CDN systems and the different techniques used for each aspect. Furthermore, the survey highlights the current challenges and areas of improvement within the field of research.

    Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation

    Phylogenetics is one of the dominant data engineering research disciplines based on biological information. Here, we consider raw DNA sequences and perform comparative analysis in order to draw important conclusions. The phylogenetic tree helps significantly in representing evolutionary relationships among different organisms in a concise manner. The elementary step in constructing a phylogenetic tree is to calculate the genetic distance among species. Alignment-based and alignment-free approaches are the two main distance computation methods used to find the genetic relatedness of different species. In this paper, we propose a novel alignment-free, pairwise distance calculation method based on k-mers and a state-of-the-art machine learning-based phylogenetic tree construction mechanism. With the proposed approach, we can convert longer DNA sequences into compact k-mer forests, which improve the efficiency of comparison. We then construct the phylogenetic tree from the calculated distances with the help of an algorithm built upon k-medoid clustering, which achieved significant efficiency and accuracy compared to traditional phylogenetic tree construction methods.
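
    To make the notion of an alignment-free, pairwise distance based on k-mers concrete, here is a minimal sketch. It does not implement the k-mer forest method of the paper, whose details are not given in the abstract. It substitutes a plain Jaccard distance over k-mer sets, and the function names and the default `k=3` are assumptions chosen for illustration.

```python
def kmer_set(sequence, k=3):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=3):
    """Alignment-free distance: 1 - |shared k-mers| / |all k-mers|."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k; treat them as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(sequences, k=3):
    """Pairwise distances for a dict of name -> sequence."""
    names = sorted(sequences)
    return {
        (x, y): jaccard_distance(sequences[x], sequences[y], k)
        for x in names
        for y in names
    }


# Hypothetical toy sequences, not real data.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTTTTT"}
matrix = distance_matrix(toy)
```

    A matrix like this could then be passed to a clustering or tree-building step. Identical sequences get distance 0 and sequences with no shared k-mers get distance 1.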